Fixing bad graphs: The Economist

“From the Medium artile, ‘Mistakes, we’ve drawn a few’’”

Background

We’re going to re-create the graphs below from the Medium article, ‘Mistakes, we’ve drawn a few’. The author of this article claims the mistake with this particular graph below:

Mistake: Choosing the wrong visualisation method

Let’s take a closer look at the original graph with some of her comments:

We published this polling chart in Espresso, our daily news app. It shows attitudes to the outcome of the EU referendum, plotted as a line chart. Looking at the data, it appears as if respondents had a rather erratic view of the referendum result — increasing and decreasing by a couple of percentage points from one week to the next.

Looking at the data, it appears as if respondents had a rather erratic view of the referendum result — increasing and decreasing by a couple of percentage points from one week to the next.

Outline

We’re going to:

  1. Inspect the data from the article
  2. Build one visualization per dataset
  3. Re-create the original graph
  4. Create a ‘better’ graph

Packages

Below we load the necessary packages:

# install.packages(c("tidyverse", "janitor", "skimr", "fs", "ggthemes"))
library(tidyverse)
library(janitor)
library(skimr)
library(fs)
library(ggthemes)

Data

There are four data files for this article, which we import from the course GitHub repo below:

# import data
Balance <- readr::read_csv("https://bit.ly/brexit-balance")
Corbyn <- readr::read_csv("https://bit.ly/brexit-corbyn")
Pensions <- readr::read_csv("https://bit.ly/brexit-pensions")
Brexit <- readr::read_csv("https://bit.ly/brexit-brexit")

Balance data

The first dataset we’ll investigate is the Balance data:

glimpse(Balance)
Rows: 266
Columns: 4
$ country      <chr> "Belgium", "Germany", "Estonia", "Ireland", "Gr…
$ account_type <chr> "current", "current", "current", "current", "cu…
$ year         <dbl> 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009,…
$ value        <dbl> -3755.0, 141234.0, 360.0, -7912.1, -29323.0, -4…

We can see this data contains values per country in the EU, grouped by year and some kind of account_type. These clearly aren’t the data from the original and better graphs above, so we won’t be diving deeper into each variable.

But we can use the Balance dataset to create a column-graph of value (on the x) and country (on the y).

Create sign variable

Because value contains both positive and negative numbers, we’ll create an identifier for the sign.

Balance %>%
  mutate(sign = 
      if_else(condition = value > 0, 
        true = "Positive Balance", 
        false = "Negative Balance"))
# A tibble: 266 × 5
   country account_type  year   value sign            
   <chr>   <chr>        <dbl>   <dbl> <chr>           
 1 Belgium current       2009  -3755  Negative Balance
 2 Germany current       2009 141234  Positive Balance
 3 Estonia current       2009    360  Positive Balance
 4 Ireland current       2009  -7912. Negative Balance
 5 Greece  current       2009 -29323  Negative Balance
 6 Spain   current       2009 -46191  Negative Balance
 7 France  current       2009 -10652  Negative Balance
 8 Italy   current       2009 -29717  Negative Balance
 9 Cyprus  current       2009  -1431  Negative Balance
10 Latvia  current       2009   1463  Positive Balance
# … with 256 more rows

A great way to check your work when creating new categorical variables is passing the output to janitor::tabyl(). In this case, we want to see the countries broken down by our new sign variable.

Balance %>%
  mutate(sign = 
      if_else(condition = value > 0, 
        true = "Positive Balance", 
        false = "Negative Balance")) %>% 
  janitor::tabyl(country, sign)
     country Negative Balance Positive Balance
     Austria                7                7
     Belgium               13                1
      Cyprus               14                0
     Estonia                4               10
     Finland               12                2
      France               14                0
     Germany                5                9
      Greece               14                0
     Ireland               11                3
       Italy               11                3
      Latvia               12                2
   Lithuania               11                3
  Luxembourg                2               12
       Malta               11                3
 Netherlands                7                7
    Portugal               11                3
    Slovakia               11                3
    Slovenia                9                5
       Spain               11                3

Initialize graph

We’ll create a canvas (or initialized plot) with value on the x, country on the y, grouped and filled by sign.

balance_global_layer <- Balance %>%
  mutate(sign = 
      if_else(condition = value > 0, 
        true = "Positive Balance", 
        false = "Negative Balance")) %>% 
  ggplot2::ggplot(aes(
    x = value,
    y = country,
    group = sign,
    fill = sign
  ))
balance_global_layer

Now that we have a blank canvas, we’ll build a column-graph using geom_col(). We know we won’t need a legend, to we’ll set this to FALSE.

balance_global_layer +
  geom_col(show.legend = FALSE)

This graph is stacking the columns, which is the default position in geom_col(). We can also use "dodge" or "dodge2":

balance_global_layer +
  geom_col(position = "dodge", show.legend = FALSE)

balance_global_layer +
  geom_col(position = "dodge2", show.legend = FALSE)

We’ll stick with position = "dodge", and call this balance_geom_col_layer

balance_geom_col_layer <- balance_global_layer +
  geom_col(position = "dodge",
    show.legend = FALSE) 

We’ll add some color to denote the positive ("green4") and negative ("firebrick") values from sign, and fix the x axis with scale_x_continuous() and scales::comma.

This layer gets saved as balance_x_scale_layer.

balance_x_scale_layer <- balance_geom_col_layer +
  scale_fill_manual(values = c("firebrick", "green4")) + 
  scale_x_continuous(labels = scales::comma)
balance_x_scale_layer

We have to other categorical variables in the Balance data, account_type and year. Let’s check how these look with facet_wrap()

balance_x_scale_layer + 
  facet_wrap(~ account_type) 

balance_x_scale_layer + 
  facet_wrap(~ year) 

Both of these would work–it just depends on which category is best at illustrating our argument. We’ll arbitrarily choose the account_type for balance_facet_wrap_layer.

balance_facet_wrap_layer <- balance_x_scale_layer + 
  facet_wrap(~ account_type) 
balance_facet_wrap_layer

Finally–we can add a theme and labels. We’ll use ggplot2::theme_minimal().

balance_facet_wrap_layer +
  ggplot2::theme_minimal() + 
  labs(title = "Balance Data (from EU referendum)", 
    subtitle = "Country values (positive or negative) by account type")

Corbyn data

The Corbyn is below:

political_group avg_facebook_likes
Jeremy Corbyn 5210
Labour Party 845
Momentum 229
Owen Smith 127
Andy Burnham 105
Saving Labour 56

There isn’t much data in Corbyn, but we’ll look at the distribution of avg_facebook_likes by political_group using geom_col():

Corbyn %>%
  ggplot(aes(x = political_group, y = avg_facebook_likes)) + 
  geom_col(aes(fill = political_group))

This is more color categories than we’d like for a graph, but these aren’t the data we’re trying to correct, so we’ll move on.

Pensions

Below are the Pensions data:

glimpse(Pensions)
Rows: 35
Columns: 3
$ country               <chr> "Australia", "Austria", "Belgium", "Br…
$ pop_65_percent        <dbl> 15.04, 18.76, 18.22, 7.84, 16.14, 11.0…
$ gov_spend_percent_gdp <dbl> 5.20, 13.86, 10.36, 12.00, 4.31, 3.25,…

We can see a single categorical variable, country, and two percentages (pop_65_percent and gov_spend_percent_gdp).

We’ll use geom_boxplot() to graph the pop_65_percent and gov_spend_percent_gdp variables (note we use y = "" in aes() and in the labs() function to remove the y)

Pensions %>%
  # the variable
  ggplot(
    aes(
      x = pop_65_percent,
      y = "")) + 
  geom_boxplot() + 
  labs(y = "")

Pensions %>%
  # the variable
  ggplot2::ggplot(aes(
    x = gov_spend_percent_gdp,
    y = "")) + 
  geom_boxplot() + 
  labs(y = "")

Box-plots are handy if we have a categorical variable we want to view distributions across, too.

Re-leveling axes

If we wanted to show how pop_65_percent changes with the gov_spend_percent_gdp, we could get the top 5 countries with the gov_spend_percent_gdp, then order our y axis by the pop_65_percent, but set the size to gov_spend_percent_gdp (I’ve multiplied it by 10 here just to make the points a little larger):

Pensions %>% 
  arrange(desc(gov_spend_percent_gdp)) %>% 
  group_by(country) %>% 
  slice_max(gov_spend_percent_gdp) %>% 
  head(5) %>% 
  ggplot(
    aes(x = pop_65_percent,
        y = fct_reorder(.f = country, .x = pop_65_percent),
        color = country)) + 
  geom_point(aes(size = gov_spend_percent_gdp*10), show.legend = FALSE) + 
  labs(y = "Country", x = "Pop 65 percent")

Again–these aren’t the data we’re trying to fix, so we’ll move on.

Brexit data

The Brexit data are below:

glimpse(Brexit)
Rows: 85
Columns: 3
$ date                     <chr> "2/8/16", "9/8/16", "17/08/16", "23…
$ percent_responding_right <dbl> 46, 45, 46, 45, 47, 46, 45, 45, 46,…
$ percent_responding_wrong <dbl> 42, 44, 43, 43, 44, 43, 44, 44, 43,…

These are the data from the graph, so we’ll break each step down layer-by-layer.

Format date

First we need a properly formatted date

Brexit <- Brexit %>% mutate(date = lubridate::dmy(date))
glimpse(Brexit)
Rows: 85
Columns: 3
$ date                     <date> 2016-08-02, 2016-08-09, 2016-08-17…
$ percent_responding_right <dbl> 46, 45, 46, 45, 47, 46, 45, 45, 46,…
$ percent_responding_wrong <dbl> 42, 44, 43, 43, 44, 43, 44, 44, 43,…

Line-graphs

We’ll start with some line-graphs of date vs. percent_responding_right and percent_responding_wrong

Place date (on the x) and percent_responding_right (on the y)

Brexit %>%
  ggplot(aes(x = date,
    y = percent_responding_right)) + 
    geom_line(size = 2.7, color = "cornflowerblue")

Plot date (on the x) and percent_responding_wrong (on the y)

Brexit %>%
  ggplot(aes(x = date,
    y = percent_responding_wrong)) + 
    geom_line(size = 2.7, color = "firebrick")

We want both percent metrics in the same column, indexed on poll. The code below will assign the output from tidyr::pivot_longer() to TidyBrexit:

TidyBrexit <- Brexit %>%
  pivot_longer(-date,
    names_to = "poll",
    values_to = "percent"
  ) %>% 
  mutate(
    poll = str_replace_all(poll, "_", " ")
  )
glimpse(TidyBrexit)
Rows: 170
Columns: 3
$ date    <date> 2016-08-02, 2016-08-02, 2016-08-09, 2016-08-09, 201…
$ poll    <chr> "percent responding right", "percent responding wron…
$ percent <dbl> 46, 42, 45, 44, 46, 43, 45, 43, 47, 44, 46, 43, 45, …

Original line plot

Now we pass the pivoted data to geom_line() and view the results.

TidyBrexit %>%
  ggplot2::ggplot(aes(
    x = date, y = percent,
    group = poll, color = poll)) + 
  geom_line(size = 2.7) + 
  scale_color_manual(values = c("cornflowerblue", "firebrick3")) + 
  theme_minimal() + 
  labs(title = "Bremorse", 
    subtitle = "In hindsight, do you think Britain was right or wrong to leave the EU\n% responding", 
    x = "", y = "")

This is pretty close to the Original (minus the realigned y axis, text labels, and reformatted x axis).

Create ‘Better’ Brexit graph

Now we’ll create the ‘Better’ graph, and make some of our own ‘improvements.’

Map variables to aesthetics

Use TidyBrexit to build the ggp_brexit plot object layer with date on the x, percent on the y, and group assigned to poll:

ggp_brexit_global_layer <- TidyBrexit %>%
  ggplot(mapping = aes(x = date, y = percent, group = poll))
ggp_brexit_global_layer

Add the points with ggplot2::geom_point()

Add the points with ggplot2::geom_point(), but include the show.legend = FALSE, size = 2.5, and alpha = 1/3. Save this as ggp_brexit_point_layer

ggp_brexit_point_layer <- ggp_brexit_global_layer +
  geom_point(aes(color = poll), 
    show.legend = FALSE, size = 2.5, alpha = 1 / 3)
ggp_brexit_point_layer

Add the lines with geom_smooth()

Add the ggplot2::geom_smooth() to ggp_brexit_point_layer

ggp_brexit_point_layer + 
  ggplot2::geom_smooth()

Add the color aethetic to poll

ggp_brexit_point_layer + 
  ggplot2::geom_smooth(aes(color = poll))

Remove error with se = FALSE

Add the se = FALSE to remove the gray area around the lines, and remove the legend with show.legend = FALSE and create the ggp_brexit_smooth_layer object.

ggp_brexit_smooth_layer <- ggp_brexit_point_layer +
  geom_smooth(aes(color = poll), 
    se = FALSE, show.legend = FALSE)
ggp_brexit_smooth_layer

Add colors with scale_color_manual()

Add the colors cornflowerblue and firebrick3 to the ggp_brexit_smooth_layer layer and create a new layer called ggp_brexit_colors_layer.

ggp_brexit_colors_layer <- ggp_brexit_smooth_layer +
  scale_color_manual(values = c("cornflowerblue", "firebrick3"))
ggp_brexit_colors_layer

Add text

Use the ggplot2::geom_text() layer to add the text to the ggp_brexit_colors_layer layer, and name the new plot ggp_brexit_annotate_text_layer

ggp_brexit_annotate_text_layer <- ggp_brexit_colors_layer +
  annotate(geom = "text",
    label = "Wrong",
    size = 6,
    color = "firebrick3",
    x = as.Date("2018-01-01"), 
    y = 45.5
  ) +
  annotate(geom = "text",
    label = "Right", 
    size = 6,
    color = "cornflowerblue",
    x = as.Date("2018-01-01"),
    y = 42
  )
ggp_brexit_annotate_text_layer

Move the y axis

Move the y axis to the right side of the graph with ggplot2::scale_y_continuous()

ggp_brexit_y_scale_layer <- ggp_brexit_annotate_text_layer +
  ggplot2::scale_y_continuous(position = "right")
ggp_brexit_y_scale_layer

Add labels

Use the ggplot2::labs() function to create the labels layer.

labs_eco <- labs(
  title = "Bremorse", 
  subtitle = "In hindsight, do you think Britain was right or wrong to leave the EU\n% responding", 
  x = "", y = "")

Add the labels layer to ggp_brexit_y_scale_layer and create a new plot titled, ggp_brexit_labs.

ggp_brexit_labs <- ggp_brexit_y_scale_layer + labs_eco
ggp_brexit_labs

Add theme

# install.packages("ggthemes")
library(ggthemes)

Add the ggthemes::theme_economist_white() layer and set the following arguments:

ggp_brexit_labs +
  ggthemes::theme_economist_white(
    gray_bg = FALSE,
    base_size = 12,
    base_family = "Verdana"
  )